Comparison of Different Modelling Approaches of Trihalomethanes (THMs) Formation in Drinking Water

Sanjay Verma, Ashok Sharma, Sarita Sharma and Rajan K. Priyadarshi*

Department of Chemical Engineering, Ujjain Engineering College, Ujjain (M.P.)

*Corresponding Author E-mail: rajan_priyadarshi@yahoo.co.in

ABSTRACT:

In drinking water treatment plants, chlorination is used for disinfection. Chlorine addition provides residual protection against recontamination of water by pathogenic micro-organisms in the distribution system, but it also reacts with natural organic matter (NOM) present in water to form by-products that are harmful under long-term consumption. Trihalomethanes (THMs) are one such group of by-products, comprising CHCl₃, CHCl₂Br, CHClBr₂ and CHBr₃. In drinking water, the formation of THMs is a function of pH, temperature, reaction time, total organic carbon (TOC), chlorine dose, etc. Modelling THM formation from a few important parameters is a useful alternative to chemical analysis. This paper presents the application of two empirical models for simulating and forecasting THMs concentrations within drinking water. The first is a linear autoregressive model with external inputs, known as ARX; the second is a non-linear artificial neural network (ANN) model. The results demonstrate the potential of an ANN model, which has a unique ability to detect non-linear complex relationships between data. When all the given data are evaluated, simulation results show a similar performance for the linear and non-linear models. However, for specific water treatment conditions (very high and very low chlorine doses, pH and TOC), the ANN model gives better predictions than the ARX model.

KEYWORDS: Residual chlorine; Drinking water; Neural networks; ARX

INTRODUCTION:

Chlorine is the most common disinfectant used in the drinking water treatment process throughout the world. When it is applied to the process, its aim is to eliminate pathogens. When leaving the treatment plant, water requires a residual amount of chlorine to ensure its microbiological stability during transportation throughout the distribution system. However, chemical reaction of chlorine with organic compounds in the treated water favours the formation of disinfection by-products (DBPs), some being suspected to be carcinogenic (Rook, 1974; Reuber, 1979). During disinfection, the chlorine reacts with natural organic matter (NOM) present in the water. The reaction produces chlorination by-products including trihalomethanes (THMs): chloroform (CHCl3), bromodichloromethane (CHCl2Br), dibromochloromethane (CHClBr2) and bromoform (CHBr3). The reaction is reflected in the following equation:

NOM + Chlorine → THMs + Other products

The more chlorine is added and the more organic matter the water contains, the higher the potential for formation of DBPs. Reactions between chlorine (hypochlorous acid and hypochlorite ion) and natural organic matter (NOM) in water form a wide range of halogenated and non-halogenated compounds (de Leer et al., 1985; Stevens et al., 1989). The halogen-containing compounds appearing in the highest concentrations are small organic molecules, such as the trihalomethanes (THMs) (Reckhow et al., 1990).

To periodically adjust chlorine doses (by increasing or decreasing them), operators generally use information about residual chlorine at strategic points within the treatment plant or in the distribution system. Frequently, this information is available from on-line chlorine monitors. However, such information involves a time delay due to the travel time of water between the dosing point and the monitored strategic point. To counterbalance the inconvenience of such a delay, modelling strategies have been applied.

The aim of this paper is to evaluate the ability of two empirical modelling approaches to forecast THMs evolution in two drinking water systems. More particularly, the purpose consists of comparing the forecasting capabilities of a linear and a non-linear model. The first is the classical model: the autoregressive model with external inputs (ARX). The non-linear model is an artificial neural network (ANN). This will allow the assessment of the benefits of using non-linear models when simulating the decay of residual chlorine in water treatment.

MATERIALS AND METHODS:

In this paper, completely empirical approaches are proposed. The approaches aim to empirically relate input variables (factors affecting the THMs formation) with an output variable (resulting THMs, which is the variable to be forecasted) using linear and non-linear time-dependent model structures. Parameters of the models are estimated from representative data describing the process. In this investigation, the process describes the THMs in two drinking water sources. Successful application of such empirical modelling approaches depends on the availability of continuous representative information about the factors related to residual chlorine evolution.

Empirical linear model:

The ARX linear model has been broadly applied to system identification problems (Soderstrom and Stoica, 1989; Ljung, 1991). The selected ARX model for the present application has the following form:

(1)

where y_t denotes the variable to be modelled, that is, the forecasted concentration of THMs at one step in the future; u₁ to u_n denote the set of exogenous variables, or the factors which affect chlorine decay; e_t represents the unmodelled information; p and q denote the orders for the autoregressive terms for the modelled and the exogenous variables, respectively; d denotes the delay in time for the exogenous variables. Finally, A_i and B_1j to B_nj are the parameters to be estimated. This model structure appeared to be adequate because, by including exogenous variables, the phenomenology of the process is likely to be respected. In addition, the inclusion of an autoregressive term allows the temporal dynamics of chlorine evolution in the system to be taken into account. This also allows for intrinsic consideration of the influence of factors for which information is not available.
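For reference, one ARX form consistent with the definitions above can be written as follows. This is a sketch under stated assumptions rather than a reproduction of equation (1): the summation limits and the placement of the delay d in the index of the exogenous terms are assumed, since several equivalent indexing conventions exist, and k is an index introduced here to run over the n exogenous variables.

```latex
y_t = \sum_{i=1}^{p} A_i \, y_{t-i}
    + \sum_{k=1}^{n} \sum_{j=1}^{q} B_{kj} \, u_{k,\,t-d-j}
    + e_t
```

With p = q = 1, n = 1 and d = 0, this reduces to y_t = A_1 y_{t-1} + B_11 u_{1,t-1} + e_t.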

Empirical non-linear model:

The non-linear model being used is an artificial neural network (ANN). This type of connectionist model has been selected over other non-linear structures because of its recognized capacity for establishing complex relationships between input and output sets of data and for its ability to generalize (Hammerstrom, 1993; Rumelhart et al., 1994). An ANN is comprised of an input layer representing the relevant process variables (the parameters which affect evolution of THMs), one or more hidden layers for processing information and one output layer representing the variables to be modelled (forecasted THMs). Each layer is fully connected by links; each link is characterized by a weight which represents the parameter to be estimated. In this study, three-layer ANNs are considered (Fig. 1). The input layer consists of an autoregressive structure equivalent to that of the ARX model. The number of elements in this layer depends on the number n of exogenous variables required to simulate the process. The output layer consists of one element which denotes the THMs at one step in the future. The size of the hidden layer, h, must also be determined through experimentation following a training/test process of the model. The non-linearity of such a model is due to a transfer function f which operates within the elements of the hidden and output layers. The most common transfer function is a sigmoid and is defined as follows:

f(x) = \frac{1}{1 + e^{-x}}    (2)

Fig-1. ANN Structure for THMs Modelling.

Back-propagation is the most commonly used training algorithm for neural networks. Developed initially by Werbos (1974), the back-propagation algorithm has lower memory requirements than most algorithms and generally reaches an acceptable level of error quite rapidly. A simple back-propagation neural network (BPNN) has a feedforward structure: signals flow from the input units forward through any hidden units, finally reaching the output units (Fausett, 1994; Haykin, 1994; Patterson, 1996). The general structure of a BPNN is well known and can be found in numerous publications (Rumelhart et al., 1986; Fausett, 1994; Masters, 1995).

There are two approaches to training - supervised and unsupervised. Supervised training involves a mechanism of providing the network with the desired output either by manually "grading" the network's performance or by providing the desired outputs with the inputs. Unsupervised training is where the network has to make sense of the inputs without outside help.

Table-1. Range of input and output parameters from field data

Sr. No.	Water quality parameter	Source-1		Source-2	
		Mean	Range	Mean	Range
1	pH	7.2	6.0-9.0	7.4	6.1-8.2
2	Turbidity (NTU)	3.85	1.36-16.0	0.68	0.21-2.83
3	Ammonia (mg/L)	0.0096	0.00-0.80	0.009	0.00-0.0835
4	TOC (mg/L)	2.16	1.244-3.731	2.223	1.705-3.330
5	Temperature (°C)	29.1	26.5-32.2	29.6	28.0-32.2
6	Cl2 dosage (mg/L)	4.262	3.481-4.773	2.493	2.260-2.730
7	Cl2 residual (mg/L)	1.926	0.500-4.000	1.103	0.200-2.000
8	THM (mg/L)	1.926	0.0024-0.1204	0.07	0.0077-0.1365

In supervised training, both the inputs and the outputs are provided. The network then processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked. The set of data which enables the training is called the "training set." During the training of a network the same set of data is processed many times as the connection weights are ever refined.
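The supervised training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network used in this study: the class name ThreeLayerNet, the hidden-layer size h = 3, the learning rate, the number of passes and the synthetic training set are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    """Logistic transfer function applied in the hidden and output layers."""
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input layer -> one hidden layer of h sigmoid units -> one sigmoid output."""

    def __init__(self, n_inputs, h, seed=0):
        rng = random.Random(seed)
        # Each hidden unit has one weight per input plus a bias (last entry).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(h)]
        # The output unit has one weight per hidden unit plus a bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(h + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out[:-1], hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr):
        """One supervised update: forward pass, then propagate the error back."""
        hidden, out = self.forward(x)
        # Output delta for squared error with a sigmoid output unit.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[k] * hidden[k] * (1.0 - hidden[k])
                        for k in range(len(hidden))]
        for k, hk in enumerate(hidden):
            self.w_out[k] -= lr * delta_out * hk
        self.w_out[-1] -= lr * delta_out
        for k, ws in enumerate(self.w_hidden):
            for i, xi in enumerate(x):
                ws[i] -= lr * delta_hidden[k] * xi
            ws[-1] -= lr * delta_hidden[k]


def mean_squared_error(net, samples):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical normalized training set (not field data): two inputs, one output.
samples = [((a / 4, b / 4), 0.2 + 0.6 * (a / 4) * (b / 4))
           for a in range(5) for b in range(5)]

net = ThreeLayerNet(n_inputs=2, h=3)
error_before = mean_squared_error(net, samples)
for _ in range(2000):  # the same training set is presented many times
    for x, t in samples:
        net.train_step(x, t, lr=0.5)
error_after = mean_squared_error(net, samples)
print(error_before, error_after)
```

Comparing error_before with error_after shows the effect of repeatedly presenting the training set; in practice a separate verification set would be used to judge generalization.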

There have been a number of applications of data-driven methods to water distribution modelling. These include the use of an artificial neural network (ANN) for pipe condition assessment in water distribution systems (Geem, 2003), an ANN and multiple linear regression to predict THM formation (Rodriguez et al., 2003), and the use of regression, time series analysis, expert systems and ANNs to forecast short-term water demands (Jain, 2002). There have been a limited number of studies using data-driven methods to predict chlorine concentrations in a water distribution system, including the use of ANN models to predict chlorine decay in a storage tank (Serodes and Rodriguez, 1996; Rodriguez et al., 1999) and in the distribution network (Rodriguez et al., 1999).

Data Selection:

Two sets of data from two different drinking water sources (Abdullah et al., 2003) were chosen for model prediction, with pH, TOC, chlorine dose and residual chlorine as independent variables and THMs as the dependent variable. The range of the source data is given in Table-1.

For both cases, the linear and non-linear modelling approaches are used to forecast concentrations of THMs in water treatment. Continuous monitoring of pH, TOC, chlorine dose and residual chlorine in water treatment provides indispensable information for adjusting the disinfectant doses. Table-1 summarizes the information about source-1 and source-2 and the data used for modelling. The purpose of modelling is to forecast concentrations of THMs in both systems.

Water quality parameters such as total organic carbon, which directly describe organic matter contained in water, are rarely measured continuously in drinking water utilities. In Table-1, turbidity is the only parameter that may reflect continuously the physical quality of the treated water. The concentration of organic carbon indicates the water’s potential for chlorine demand and for forming THMs. Due to their relative importance, such water quality parameters are measured during planning and design phases of a water treatment plant. However, due to the high analysis cost, they are seldom measured during normal plant operations. In addition, water residence times are not continuously estimated in water systems. First estimates may be garnered by the calculation of the theoretical time (based on flow rate and section volume) or by tracer studies. This appeared to ensure a better identification of the dynamics of the residual chlorine. It has to be noted that the purpose of such a modelling application is to provide a tool aimed at helping operators to control the chlorine dose on-line, thus based on information easily generated either by on-line monitors or manually at very high frequency. The use of water quality and operational parameters for which information cannot be generated continuously or at relatively high frequency is not feasible in such an approach, even if those parameters influence residual chlorine depletion in distribution systems.

RESULTS AND DISCUSSION:

Data Analysis:

The modelling strategy consists of estimating the parameters A_i and B_1j to B_nj of the ARX structure and the weights (w_k) for the connection links of the ANN structure. For the ARX model, the number of parameters to be estimated depends on p, q and n. For the ANN model, it depends on p, q and n and on the number of elements in the hidden layer, h, which determines the number of link connections in the model. For the ARX model, parameter estimation is accomplished using the least-squares (LS) algorithm for system identification (Ljung, 1987). The equation derived by linear regression is given as equation (3), and the corresponding graph is shown in Fig-2.

(3)
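As an illustration of least-squares estimation for an ARX structure, the sketch below fits a single-input model by solving the normal equations in plain Python. It is an assumed, simplified setup rather than the procedure applied to the field data: the function names (build_regressors, solve_linear_system, fit_arx), the indexing of the lagged terms, and the synthetic noise-free series with parameters 0.6 and 0.3 are all hypothetical.

```python
import random


def build_regressors(y, u, p, q, d):
    """Rows [y_{t-1..t-p}, u_{t-d-1..t-d-q}] and targets y_t (one exogenous input)."""
    start = max(p, d + q)
    rows, targets = [], []
    for t in range(start, len(y)):
        row = [y[t - i] for i in range(1, p + 1)]
        row += [u[t - d - j] for j in range(1, q + 1)]
        rows.append(row)
        targets.append(y[t])
    return rows, targets


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_arx(y, u, p=1, q=1, d=0):
    """Least-squares estimate from the normal equations (X^T X) theta = X^T y."""
    rows, targets = build_regressors(y, u, p, q, d)
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free series generated from known parameters (not field data).
rng = random.Random(1)
u = [rng.uniform(0.0, 1.0) for _ in range(200)]
y = [0.1]
for t in range(1, 200):
    y.append(0.6 * y[t - 1] + 0.3 * u[t - 1])

a1, b1 = fit_arx(y, u, p=1, q=1, d=0)
print(a1, b1)
```

Because the synthetic series contains no noise, the estimates should match the generating parameters up to floating-point error; with measured data the residual e_t would absorb the unmodelled information.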

Estimation of weights for the ANN model is based on the back-propagation learning algorithm. An identical model development process is applied for both source-1 and source-2. For both sources, the databases are separated into different sets for model calibration and for model verification. The first step in modelling consisted of identifying the significant input variables which affect the variation of the output. The ANN modelling graph comparing field data and model data is shown in Fig-3. The average forecasting errors are presented in Table-2.

Table-2. Average Errors for Model Forecasting (Using the Verification Database)

Model	%Error
ARX	10.7
ANN	8.6

Table-3. Result Validation

Input parameters				Output parameters		Percentage difference (%)
pH	TOC	Chlorine dose	Chlorine residual	THMs from field	THMs from ANN	
0.75556	0.40300	0.71623	0.52363	0.08797	0.08000	9.05991
0.67778	0.31125	0.65679	0.49939	0.05772	0.05110	11.46916
0.91111	0.93275	0.90057	0.92505	0.68622	0.67778	1.22993
0.75556	0.40300	0.71623	0.54582	0.09765	0.08744	10.45571
0.67778	0.31100	0.65679	0.49532	0.05653	0.05018	11.23297
0.75556	0.40300	0.71623	0.52281	0.08763	0.07973	9.01518
0.75556	0.40300	0.71623	0.53198	0.09151	0.08273	9.59458
0.67778	0.40300	0.65679	0.51792	0.07690	0.07474	2.80884
0.67778	0.28600	0.65679	0.47902	0.04933	0.04299	12.85222
Average						8.6
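The percentage differences in Table-3 can be checked with a short calculation. The sketch below assumes the difference is taken relative to the field value, a formula inferred from the tabulated numbers rather than stated in the text; the function name percentage_difference is introduced here for illustration.

```python
# (THMs from field, THMs from ANN) pairs copied from Table-3.
pairs = [
    (0.08797, 0.08000),
    (0.05772, 0.05110),
    (0.68622, 0.67778),
    (0.09765, 0.08744),
    (0.05653, 0.05018),
    (0.08763, 0.07973),
    (0.09151, 0.08273),
    (0.07690, 0.07474),
    (0.04933, 0.04299),
]


def percentage_difference(field, predicted):
    """Absolute difference relative to the field value, in percent."""
    return abs(field - predicted) / field * 100.0


differences = [percentage_difference(f, a) for f, a in pairs]
average = sum(differences) / len(differences)

for (f, a), diff in zip(pairs, differences):
    print(f"{f:.5f}  {a:.5f}  {diff:.2f}")
print(f"average: {average:.1f}")
```

Under this assumption the row values agree with the tabulated percentages, and their mean rounds to the 8.6 reported for the ANN model.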

When the models' performance is evaluated in slightly more detail, better accuracy is observed for the ANN model (Table-3). For extreme values of THMs, the use of a non-linear model instead of a linear model improves performance considerably. This suggests that an ANN has a greater capability than the ARX model for identifying the relations between input and output variables for operational conditions producing low and high values of THMs. Two distinct hypotheses could potentially explain this fact. First, evolution of THMs may be more complex for those conditions; this may be the case, for example, when the concentration of organics in water increases rapidly or when operators significantly increase chlorine doses at the treatment facility. The second hypothesis for explaining the results is related to the quality of the available database. It is possible that only one part of the data, that related to extreme values of THMs, allows for representation of the non-linearity of the treatment process under study.

Fig-2. Graph between Actual Vs Predicted THM by Linear Regression

Fig-3. Graph between Actual Vs Predicted THM by ANN

CONCLUSION:

The results of this investigation demonstrate that there exists an interesting potential for empirical-based modelling (linear and non-linear) in identifying the patterns of evolution of THMs in drinking water.

The evolution of THMs in both sources is a complex process that depends on diverse operational and water quality parameters. In order to identify the dynamics of such a process using an empirical model, the data series used must represent this complexity as far as possible. However, the information required to build the models (the number of exogenous variables to be used and the order associated with them) appears to be case-specific. Modelling results obtained with two water systems reveal that, given the selected databases, the dynamics of only one of the systems (source 1) could be adequately identified using a model structure with exogenous variables. The selection of a linear or a non-linear structure depends on striking a balance between the complexity of the modelling process and the gain in performance. For example, for source 1, the application of an ANN model appears more attractive than the application of a linear model within periods of high and low peaks of residual chlorine concentrations (which are generally related to important changes in operational and water quality parameters).

In both cases under study, the ANN model demonstrated adequate results. The final goal in modelling THM evolution, however, is to forecast the phenomenon given the variation in operational and quality variables which are deterministically significant. Selected models should preferably allow a sensitivity analysis to be undertaken to evaluate the influence of variation in the input variables on the output variable. Considering the quality of data available, only the model developed for case 1 could be used for this purpose. The results obtained in this investigation are encouraging, but are not totally conclusive. The applications developed are limited to THM forecasting at one step in the future. For the purposes of disinfection management in water treatment, forecasting at more than one step in the future is very useful. However, to undertake such an application would require data which are more representative. The feasibility of incorporating a tool for THM forecasting based on empirical models depends, first, on the availability and the representativeness of information about the operational and water quality parameters, and, second, on the degree of difficulty of updating the model.

Representative data are needed to build THM models and to study the conditions which affect the variability of their forecasting performance. With such data, it will be possible to explore new possibilities for simulating THM evolution using empirical modelling approaches.

ACKNOWLEDGEMENTS:

The data for model development were taken from the research paper of Abdullah et al. (2003); the authors gratefully acknowledge them. We are also thankful to the Chemical Engineering Department, Government Engineering College, Ujjain, for support and help.

REFERENCES:

1. Abdullah P, Yew CH and Ramli S. Formation, modelling and validation of trihalomethanes (THM) in Malaysian drinking water: a case study in the districts of Tampin, Negeri Sembilan and Sabak Bernam, Selangor, Malaysia. Water Research. 2003; 37: 4637-4644.

2. De Leer EWB, Sinninghe DJS, Erkelens C, and de Galan L. Identification of intermediates leading to chloroform and c-4 diacids in the chlorination of humic acid. Environ. Sci. Technol., 1985; 512-522.

3. Fausett L. Fundamentals of neural networks, Prentice-Hall, Englewood Cliffs, N.J. 1994.

4. Geem ZW. Window-based decision support system for the water pipe condition assessment using artificial neural network, in: World Water and Environmental Resources Congress, American Society of Civil Engineers, Philadelphia, PA, United States. 2003; pp. 2027–2032.

5. Hammerstrom D. Neural networks at work. IEEE Spectrum, June, 1993; 26–32.

6. Haykin S. Neural Networks: A comprehensive foundation, Macmillan, New York. 1994.

7. Jain A. Short-term water demand forecast modelling techniques—Conventional methods versus AI American Water Works Association. 2002; 94 (7) 64–72.

8. Ljung L. System Identification: Theory for the User. Englewood Cliffs, New Jersey. 1987.

9. Ljung L. Issues in system identification. IEEE control systems. 1991; 11: 25–29.

10. Masters T. Advanced algorithms for neural networks, Wiley, New York. 1995.

11. Patterson D. Artificial neural networks, Prentice-Hall, Upper Saddle River, N.J. 1996.

12. Reckhow DA., Singer PC. and Malcolm RL. Chlorination of humic materials: by-product formation and chemical interpretations. Environ. Sci. Technol. 1990; 24:1655-1664.

13. Reuber MD. Carcinogenicity of chloroform. Environmental Health Perspectives. 1979; 31, 171.

14. Rodriguez MJ, Milot J, and Serodes JB. Predicting trihalomethanes formation in chlorinated waters using multivariate regression and neural networks, Journal of Water Supply Research and Technology—Aqua. 2003; 52 (3) 199–215.

15. Rook JJ. Formation of haloforms during chlorination of natural waters. Water Treatment and Examination. 1974; 23: 234–243.

16. Rumelhart DE, Widrow B and Lehr MA. The basic ideas in neural networks. Communications of the ACM. 1994; 37 (3), 87–92.

17. Rumelhart DE, Hinton GE, and Williams RJ. Learning representations by back-propagating errors. Nature (London). 1986; 323: 533-536.

18. Serodes JB and Rodriguez MJ. Predicting residual chlorine evolution in storage tanks within distribution systems: Application of a neural network approach, Journal of Water Supply Research and Technology—Aqua. 1996; 45 (2) 57–66.

19. Soderstrom T and Stoica PG. System identification. Englewood Cliffs, New Jersey. 1989.

20. Stevens AA, Moore LA, Slocum CJ, Smith BL, Seeger DR and Ireland JC. Chlorinated humic acid mixtures: criteria for detection of disinfection by-products in drinking water. In eds I. H. Suffet and P. MacCarthy, Aquatic Humic Substances: Influence on Fate and Treatment of Pollutants. Adv. Chem. Series, 219. American Chemical Society, Denver, CO. 1989; p. 684.

21. Werbos PJ. Beyond regression: new tools for prediction and analysis in the behavioural sciences. PhD thesis, Harvard Univ., Boston. 1974.

Received on 03.07.2010 Modified on 25.08.2010

Asian J. Research Chem. 4(4): April 2011; Page 537-541